[low-bit optim] Add coat for float8 optimizer #1231

MirMustafaAli · 2024-11-06T11:33:41Z

This is a Work in Progress PR for #1190.

As a draft PR, I have followed the first piece of advice by @gau-nernst of "extending OptimStateFp8". Have created a separate Dynamic Range Function Instead of creating a different quantize_fp8 method as it will be applied before quantization to achieve larger representation range of float8 datatypes and the class will be storing value k to inverse the it after dequantization.

Requirements:
TBA
Additional Code/logic Added:
TBA
Logic/Code changes to existing codebase:
TBA
Outcome:
TBA
Scope of Usage:
TBA
Example
TBA

Changes:

Dynamic Range Expansion Function: implementation of formula from the paper
Created OptimStateFp8WithDynamicRangeExpansion class by extending OptimStateFp8: by referencing the implementation of the OptimStatefp8. I have only overridden the dequantize method
Implemented aten.copy.default and aten.to_copy.default for OptimStateFp8WithDynamicRangeExpansion:

Benchmarks

Parameters

Parameter	Value
Learning Rate (`lr`)	0.0001
Automatic Mixed Precision (`amp`)	bf16
Seed	42
Model	timm/vit_base_patch16_224.augreg_in21k
Optimizer (`optim`)	AdamWFp8Ao_coat
Compile	False
Profile	False
Project	COAT-benchmarking
Number of Epochs	10
Run Name	AdamWFp8Ao_coat
Full BF16	False
Number of Workers	4
Batch Size	1024
Weight Decay	0
Channels Last	False
Optimizer CPU Offload	None
Cosine LR Scheduler	False
Checkpoint Activations	False

Results

…ass with DRE

pytorch-bot · 2024-11-06T11:33:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1231

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

gau-nernst · 2024-11-06T12:22:23Z

I was thinking you can just add a flag to the current OptimStateFp8, something like dynamic_range_expansion: bool, instead of subclass-ing it.

MirMustafaAli · 2024-11-06T13:21:24Z

I was thinking you can just add a flag to the current OptimStateFp8, something like dynamic_range_expansion: bool, instead of subclass-ing it.

i have added the flag for optimstatefp8. could you verify its right?

gau-nernst · 2024-11-06T13:56:29Z

I think this requires a bit more work. You need to verify that you can create an optimizer with this (add test to https://github.com/pytorch/ao/blob/main/test/prototype/test_low_bit_optim.py) as well do some short training runs for sanity checks (using https://github.com/pytorch/ao/blob/main/benchmarks/benchmark_low_bit_adam.py).

I think for merging the PR, we should wait for the official code release to check numeric against them.

If you don't mind, we can discuss more details in GPU-MODE discord group https://discord.gg/gpumode. Just create a thread under torchao and tag me in (@gau.nernst)

MirMustafaAli · 2024-11-06T13:59:21Z

I understand the situation for merging the PR. Will be glad to work on working on this issue. creating thread in gpumode

…ass with DRE

into add_coat_optimizer

…skip marker to within the function.

* Show a8wxdq load error only when the quant is used * Update Error check

This reverts commit 0bbba59.

gau-nernst · 2024-12-25T00:29:28Z

torchao/prototype/low_bit_optim/subclass_fp8.py

        self.block_size = codes.numel() // scale.numel()
+        self.sqrt_minmax_exp = sqrt_minmax_exp

    def __tensor_flatten__(self):
        return self.tensor_attrs, []


When k and sqrt_minmax_exp is not None, you need to return them here (in __tensor_flatten__()) also.

should i pass them instead of empty array?

The first returned value (currently self.tensor_attrs) is a list of strings containing the names of tensor attributes. In this case, when there is no dynamic range extension, it's just "codes", "scale". However, when there is dynamic range extension, you need to also add "k", "sqrt_minmax_exp". IIRC, when they are None, you are not supposed to include them.

Thank you for explaining it. i have added them. looking forward to your approval

gau-nernst · 2024-12-25T00:38:56Z

dev-requirements.txt

@@ -21,8 +21,7 @@ lm_eval
 diskcache
 pycocotools
 tqdm
-
-# Custom CUDA Extensions
+git+https://github.com/NVlabs/COAT.git#subdirectory=coat/optimizer/kernels # Custom CUDA Extensions


Don't add this. CPU runner will fail to build CUDA extension. We will just test this locally.

…abled

MirMustafaAli added 4 commits November 6, 2024 04:55

added dynamic range expansion

c62fcd3

created optimstate with DRE class

b11b4f6

implement copy_.default for OptimStateFp8WithDynamicRangeExpansion cl…

b887e69

…ass with DRE

implements _to_copy

ab02605

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 6, 2024

MirMustafaAli marked this pull request as draft November 6, 2024 11:35

MirMustafaAli added 5 commits November 6, 2024 07:06

removed implemented classes

43f5c08

dynamic_range_expansion -> apply_dynamic_range_expansion

7d98b15

add DRE flags to class

e03c79c

implementing contraction for dequantize

5faa7de

copy k values as well for copy method

b43c88f

MirMustafaAli added 10 commits November 8, 2024 12:25

added dynamic range expansion

ac41627

created optimstate with DRE class

79c9461

implement copy_.default for OptimStateFp8WithDynamicRangeExpansion cl…

47a7bb0

…ass with DRE

implements _to_copy

a162f94

removed implemented classes

5c1a3f4

dynamic_range_expansion -> apply_dynamic_range_expansion

1458d65

add DRE flags to class

42fbb09

implementing contraction for dequantize

9d1c00c

copy k values as well for copy method

7bc6ea4

Merge branch 'add_coat_optimizer' of https://github.com/MirMustafaAli/ao

7be5a6b

into add_coat_optimizer

MirMustafaAli force-pushed the add_coat_optimizer branch from 4c45349 to 7be5a6b Compare November 8, 2024 18:29

MirMustafaAli added 3 commits November 8, 2024 13:22

combine range_expansion into quantize_fp8 function

70937c8

passing apply_range_expansion to quantize_fp8

3583de7

remove apply_dynamic_range_expansion method

c754893

MirMustafaAli added 11 commits November 12, 2024 21:23

removed optim_addon parameter

41598a0

rename test_optim_addon to test_optim_fp8_coat_smoke

c189dc7

code formatting

6bb49ea

Merge branch 'main' into add_coat_optimizer

6707425

Moved device compatibility check for FP8 optimizer tests from pytest …

b1aea26

…skip marker to within the function.

formatting for ruff check F,I

92ca7b2

removing duplicate

861423d

checking if device is cuda before calling device capability

7661b61

Updating Readme with dynamic range Expansion and Reference to Paper

e1fa683

Merge branch 'main' into add_coat_optimizer

62eac8b

Merge branch 'main' into add_coat_optimizer

1f8f153

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

Raise a8wxdq load errors only when quant scheme is used (pytorch#1231)

9bbbc87

* Show a8wxdq load error only when the quant is used * Update Error check

MirMustafaAli added 8 commits December 15, 2024 15:20

Merge branch 'main' into add_coat_optimizer

f9d0aa1

Merge branch 'pytorch:main' into add_coat_optimizer

6eba1d1

removal of block_size parameter

e1ce12d

added geometric mean to expand function

8b8126a

geometeric mean to subclass_fp8

3004a53

merged un staged commit

0bbba59

Revert "merged un staged commit"

a18f58f

This reverts commit 0bbba59.

adding github qoptim library to test expand optim function

6a401b4

gau-nernst reviewed Dec 25, 2024

View reviewed changes

MirMustafaAli added 8 commits December 24, 2024 19:47

testcase for numerical accuracy with coat library

fa40e16

remove print statement

529ff66

removed libraries which failed cpu runner

bb6b707

adding k and sqrt_minmax_exp when dynamic range expansion has been en…

c9af6ce

…abled

ruff formating

f530c42

Merge branch 'pytorch:main' into add_coat_optimizer

024b465

tensor unflatten when dynamic range expansion is true

8de5220

Merge branch 'pytorch:main' into add_coat_optimizer

dc8cc60

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[low-bit optim] Add coat for float8 optimizer #1231

[low-bit optim] Add coat for float8 optimizer #1231

MirMustafaAli commented Nov 6, 2024 •

edited

Loading

pytorch-bot bot commented Nov 6, 2024 •

edited

Loading

gau-nernst commented Nov 6, 2024

MirMustafaAli commented Nov 6, 2024

gau-nernst commented Nov 6, 2024

MirMustafaAli commented Nov 6, 2024 •

edited

Loading

gau-nernst Dec 25, 2024 •

edited

Loading

MirMustafaAli Dec 25, 2024

gau-nernst Dec 25, 2024

MirMustafaAli Dec 25, 2024

gau-nernst Dec 25, 2024

[low-bit optim] Add coat for float8 optimizer #1231

Are you sure you want to change the base?

[low-bit optim] Add coat for float8 optimizer #1231

Conversation

MirMustafaAli commented Nov 6, 2024 • edited Loading

Benchmarks

pytorch-bot bot commented Nov 6, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1231

gau-nernst commented Nov 6, 2024

MirMustafaAli commented Nov 6, 2024

gau-nernst commented Nov 6, 2024

MirMustafaAli commented Nov 6, 2024 • edited Loading

gau-nernst Dec 25, 2024 • edited Loading

Choose a reason for hiding this comment

MirMustafaAli Dec 25, 2024

Choose a reason for hiding this comment

gau-nernst Dec 25, 2024

Choose a reason for hiding this comment

MirMustafaAli Dec 25, 2024

Choose a reason for hiding this comment

gau-nernst Dec 25, 2024

Choose a reason for hiding this comment

MirMustafaAli commented Nov 6, 2024 •

edited

Loading

pytorch-bot bot commented Nov 6, 2024 •

edited

Loading

MirMustafaAli commented Nov 6, 2024 •

edited

Loading

gau-nernst Dec 25, 2024 •

edited

Loading